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Abstract The Karmarkar-Karp differencing algorithm is the best known polynomial time heuristic for the number 
partitioning problem, fundamental in both theoretical computer science and statistical physics. We analyze the per- 
formance of the differencing algorithm on random instances by mapping it to a nonlinear rate equation. Our analysis 
reveals strong finite size effects that explain why the precise asymptotics of the differencing solution is hard to establish 
by simulations. The asymptotic series emerging from the rate equation satisfies all known bounds on the Karmarkar- 
Karp algorithm and projects a scaling n~ cla ", where c = l/(21n2) = 0.7213 Our calculations reveal subtle relations 

between the algorithm and Fibonacci-like sequences, and we establish an explicit identity to that effect. 
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1 Introduction 

Consider a list of n positive numbers. Replacing the two largest 
numbers by their difference yields a new list of n — 1 numbers. 
Iterating this operation n — 1 times leaves us with a single num- 
ber. Intuitively we expect this number to be much smaller than 
all the numbers in the original list. But how small? This is the 
question that we address in the present paper. 

The operation that replaces two numbers in a list by their 
difference is called differencing, and the procedure that itera- 
tively selects the two largest numbers for differencing is known 
as largest differencing method or LDM. This method was intro- 
duced in 1982 by Karmarkar and Karp [|Tj as an algorithm for 
solving the number partitioning problem (Npp): Given a list 
a\,a2, ■ ■ ■ ,a n of positive numbers, find a partition, i.e. a subset 
Ac{l »} such that the discrepancy 
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is minimized. Obviously, LDM amounts to deciding iteratively 
that the two largest numbers will be put on different sides of 
the partition, but to defer the decision on what side to put each 
number. The final number then represents the discrepancy. 

Despite its simple definition, the Npp is of considerable 
importance both in theoretical computer science and statisti- 
cal physics. The Npp is NP-hard, which means (a) that no al- 
gorithm is known that is essentially faster than exhaustively 
searching through all 2" partitions, and (b) that the NPP is com- 
putationally equivalent to many famous problems like the Trav- 
eling Salesman Problem or the Satisfiability Problem J2|- In 
fact, the Npp is one of Garey and Johnson's six basic NP-hard 



problems that lie at the heart of the theory of NP-completeness 
J31, and it is the only one of these problems that actually deals 
with numbers. Hence it is often chosen as a base for NP-hard- 
ness proofs of other problems involving numbers, like bin pack- 
ing, multiprocessor scheduling J4|, quadratic programming or 
knapsack problems. The Npp was also the base of one of the 
first public key crypto systems |5j. 

In statistical physics, the significance of the Npp results 
from the fact that it was the first system for which the local 
REM scenario was established |6][7|. The notion local REM 
scenario refers to systems which locally (on the energy scale) 
behaves like Derrida's random energy model |8][9). ^ * s con " 
jectured to be a universal feature of random, discrete systems 
ifTUll . Recently, this conjecture has been proven for several spin 
glass models 111 111121 and for directed polymers in random me- 
dia O- 

Considering the NP-hardness of the problem it is no sur- 
prise that LDM (which runs in polynomial time) will generally 
not find the optimal solution but an approximation. Our initial 
question asks for the quality of the LDM solution to NPP, and 
to address this question we will focus on random instances of 
the Npp where the numbers aj are independent, identically dis- 
tributed (i.i.d.) random numbers, uniformly distributed in the 
unit interval. Let L n denote the output of LDM on such a list. 
Yakir [141 proved that the expectation E [L„] is asymptotically 
bounded by 

n < E [L„\ < n , (2) 
where a and b are (unknown) constants such that 



b> a> 
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21n2 



0.7213. 



(3) 



In this contribution we will argue that b — a= ■ 
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The paper is organized as follows. We start with a compre- 
hensive description of the differencing algorithm, a simple (but 
wrong) argument that yields the scaling (|2]i and a presentation 
of simulation data that seems to violate the asymptotic bound 
(01. In section [3] we reformulate LDM in terms of a stochastic 
recursion on parameters of exponential variates. This recursion 
will then be simplified to a deterministic, nonlinear rate equa- 
tion in section|4] A numerical investigation of this rate equation 
reveals a structure in the dynamics of LDM that can be used as 
an Ansatz to simplify both the exact recursions and the rate 
equation. This will lead to a simple, Fibonacci like recursion 
(section[5]i and to an analytic solution of the rate equation (sec- 
tion |6j. In both cases we can derive the asymptotics including 
the corrections to scaling, and we claim that a similar asymp- 
totic expansion holds for the original LDM. The latter claim is 
corroborated by fitting the asymptotic expansion to the avail- 
able numerical data on LDM. 



2 Differencing Algorithm 



© © o © 




Figure 1. The differencing algorithm in action. 



The differencing scheme as described in the introduction 
gives the value of the discrepancy, but not the actual partition. 
For that we need some additional bookkeeping, which is most 
easily implemented in terms of graphs (Fig. [TJ. The algorithm 
maintains a list of rooted trees where each root is labeled with 
a number. The algorithm starts with n trees of size one and the 
roots labeled with the numbers a,. Then the following steps are 
iterated until a single rooted tree of size n remains: 

1 . Among all roots, find those with the largest (x) and second 
largest (v) label. 

2. Join nodes x and y with an edge, declare node x as the root 
of the new tree and relabel it with x — y. 

After n — 1 iterations all nodes are spanned by a tree whose 
root is labeled by the final discrepancy. This tree can easily be 
two-colored, and the colors represent the desired partition. 



Fig-HJillustrates this procedure on the instance (4,5,6,7,8). 
The final two coloring corresponds to the partition (4,5,7) ver- 
sus (6,8) with discrepancy 2. Note that the optimum partition 
(4,5,6) versus (7,8) achieves discrepancy 0. 

Technically, LDM boils down to deleting items from and 
inserting items into a sorted list of size n. This can be done 
in time (nlnn) using an advanced data structure like a heap 
03) ■ Hence LDM is very efficient, but how good is it? As 
we have already seen in the example, LDM can miss the op- 
timal partition. And for random instances, the corridor (O is far 
above the true optimum, which is known to scale like © {yfn2~ n ) 
J7). Yet LDM yields the best results that can be achieved in 
polynomial time. Many alternative algorithms have been inves- 
tigated in the past [16, 17), but they all produce results worse 
than ©. The few algorithms that can actually compete with the 
Karmarkar-Karp procedure use the same elementary differenc- 
ing operation [ 1 8 , 1 9) . It seems as if the differencing scheme 
marks an inherent barrier for polynomial time algorithms. 

The following argument explains the scaling (|2j. The typ- 
ical distance between adjacent pairs of the n numbers in the 
interval [0,1] is n . Hence after n/2 differencing operations 
we are left with n/2 numbers in the interval [0,n ]. The typi- 
cal distance between pairs is now 2n~ 2 . After another round of 
n/A differencing operations we get «/4 numbers in the range 
[0,8rc~ 3 ]. In general, after 2 k differencing operations we are 

left with n/2 k numbers in the range [0,2(2) n~ k \. Reducing the 
original list to a single number requires k = log 9 n differencing 
operations, and applying the above argument all the way down 
suggests that 

E[L„] (4) 

with 



As we will see, this is the right scaling, yet the argument can- 
not be correct. This follows from the fact that it predicts the 
same scaling for the paired differencing method (PDM). Here 
in each round all pairs of adjacent numbers are replaced by their 
difference in parallel. This method, however, yields an average 
discrepancy of order @(n _I ) li20l . Yet, our analysis below sug- 
gests that and © indeed describe the asymptotic behavior 
correctly, although a far more subtle treatment is required. 

An obvious approach to find the quality of LDM are sim- 
ulations. We ran LDM on random instances of varying size «, 
and Figure |2] shows the results for E[L„]. Apparently lnE[L„] 
scales like ln 2 «, in agreement with (0 and ©. A linear fit 
seems to yield 

c ~ 0.65 

for the constant in (|4), which clearly violates the bound c > 
1/2 In 2. Apparently even « = 10 6 is too small to see the true 
asymptotic behavior. This may be the reason why Monte Carlo 
studies of LDM never have been published. 

A plot of the probability density function (pdf) of L n /E [L n ] 
reveals a data collapse varying values of n (Pig. [3j. Apparently 
the complete statistics of L n is asymptotically dominated by a 
single scale n~ cln ". 

Some technical notes about simulating LDM are appro- 
priate. Differencing means subtracting numbers over and over 
again. The numerical precision must be adjusted carefully to 
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joint distribution of the ratios S^/Sn^i, k= 1, ...,«, is the same 
as that of the order statistics of n i.i.d. uniform variates from 
[0, 1] l23l . As a consequence, LDM will produce the same dis- 
tribution of data no matter whether it is run on uniform variates 
or on Sk/S n+ \. Let L n denote the result of LDM on the par- 
tial sums S\ ,52, ■ ■ ■ ,5„. Since the output of LDM is linear in its 
input, we have 



: 5 n +l-L;j 



(6) 



where 5„ + i is the sum of n + 1 i.i.d. exponential variates and 

the notation X = Y indicates that the random variable X and Y 
have the same distribution. The probability density of 5„+i is 
the gamma density 



_j 
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Figure 2. Results of LDM applied to n random i.i.d. numbers, uni- 
formly drawn from the unit interval. Each data point represents be- 
tween 10 5 (large n) and 10 7 samples (small n). The solid line is the 
linear fit -lnE[L„] = 1.42 + 0.65 ln 2 n. 
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Figure 3. Probability density function of L„/E [L n \. 



support this and to be able to represent the final discrepancy 
of order n~ n . We used the freely available GMP library |2~T1 
for the required multiple precision arithmetic and ran all simu- 
lations on £-bit integers where the number of bits ranges from 
I = 40 (for n = 20) to £ = 300 for n = 1.5 • 10 7 . The integer 
discrepancies were then rescaled by 2 . The pseudo random 
number generator was taken from the TRNG library [22]. 



3 Exact Recursions 

A common problem in the average-case analysis of algorithms 
like LDM is that numbers become conditioned and cease to be 
independent as the algorithm proceeds. Lueker fl2"0l proposed 
to use exponential instead of uniform variates to cope with this 
problem. Let X[,... ,X n+ \ be i.i.d. random exponentials with 
mean 1 and consider the partial sums 

5t =L*=i*i- Then the 



gn+i0) = -r e ' ■ 
«! 

Taking expectations of both sides of (O we get 



E[jy 



E[L„] 



1 



(7) 



(8) 



This allows us to derive the asymptotics of E[L„] from the 
asymptotics of E [L„] . 

Exponential variates are well suited for the analysis of LDM 
because the sum and difference of two exponential variates are 
again exponential variates. Once started on exponential vari- 
ates, LDM keeps working on exponentials all the time. This 
allows us to express the operation of LDM in terms of a recur- 
sive equation for the parameters of exponential densities JT4). 
We start with the following Lemma: 

Lemma 1 Let X\ and X 2 be independent exponential random 
variables with parameter X\ and %i, resp.. The probability of 
the event X\ < Xj_ is given by 



P(Xi <X 2 ) 



(9) 



Furthermore, conditioned on the event X\ < Xi, the variables 
X\ and X2 —X\ are independent exponentials with parameters 
A] +At (forXy) and X^forXi —X\. 

The proof of Lemma [TJ consists of trivial integrations of the 
exponential densities and is omitted here. 

Next we consider generalized partial sums of exponentials, 
described by n-tuples 

(Al , A2, • • • , An) . 

This n-tuple is shorthand for the sequence of partial sums 

(x l ,x 1 +x 2 ,..., , £x i ) 

1=1 

wifhZ i =EXP(A i ). 

Now let us look at the result of one iteration of LDM on 
(Ai,A2, . . . , A„). The two largest numbers are removed and re- 
placed by their difference X n which is an EXP(A„) variate. 
Lemma Q] tells us, that the probability that this number is the 
smallest in the list is 



P(X„<X l ) = 



X n 



At + A„ ' 



4 
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and conditioned on that event, the smallest number is an EXP(Ai 
A„) variate and the increment to the 2nd smallest number X\ — 
X„ is an independent EXP(Aj ) variate. Conditioned on X„ < X\ 
we get another A -tuple as the input for the next iteration: 

+ A«,Al,A2, ... ,A„_2) 

The probability that X„ > X\ is 

A, 



(1,1,1,1) 



P(X«>Xi) 



Ai + A« ' 



and in this case X\ is an EXP(A„ + Ai) variate, whereas the 
difference X n —X\ is an EXP(A„) variate. Now the probability 
that the new number X„ is second in the new list reads 

p (x n > Xi nx„ < Xi + x 2 ) = p (x„ > Xi nx„ -Xi < x 2 ) 

Ai X n 
Ai + A„ At + A„ 

and conditioned on that event the input for the new iteration is 

(Ai + A„, A2 + A„, A2, . . . , A„_2) ■ 

This argument can be iterated to calculate the probability of X„ 
becoming the £-th number in the new list. Denoting the partial 
sums by St we get 



p (X, > nx„ < s k ) = II t^V 

A k + A n ,' = 1 A i + A n 



(10) 



for k = 1 , 
is 



, n — 2 and conditioned on that event the new list 



(Ai + A„, . . . , A* + An, A&, At+i , . . . , A„_2) ■ 



(11) 



The final case is that X„ becomes the largest number in the new 
list. This happens with probability 



p(x„>5, 7 _ 2 )=n Tr A '' 



•_j A; + A„ 

and leads to the list 

(Ai + An,... , A„_2 + An, Are) ■ 



(12) 



(13) 



In all cases we stay within the set of instances given by partial 
sums of independent exponentials, and we can apply Eqs. ( fTOb 
to (13[ recursively until we have reduced the original problem 
to a (Ai,A 2 ) -instance which tells us that the final difference is 
an EXP(A2) variate. 

Fig.|4]shows the result of this analysis on the input (1,1,1,1) 
our original problem with n = 4. We have to explore the tree 
that branches according to the position that is taken by the 
new number inserted in the shortened list. The numbers writ- 
ten on the edges of the tree are the probabilities for the cor- 
responding transition. Note that we have combined the two 
branches emerging from the root that both lead to a (2,2,1)- 
configuration into a single one by adding their probabilities. In 
the end we get 



(2,1,1) 



(2,2,1) 



(3,2) 




(3,2) 



(3,1) 



Figure 4. Statistics of LDM on n = 4. The final difference is dis- 



tributed according to p4 (x) 
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Table 1. Coefficients a "' in l fl4l 



for the probability density function (pdf) of L4. In general, the 
pdf of L n is a sum of exponentials, 



Pn{x) 



E 

k 



al ks 



-lex 



(14) 



where a[ is the probability of LDM returning an EXP(£)- 
variate. For small values of «, this probabilities can be calcu- 
lated by expanding the recursions explicitly (Table Q]), but for 
larger values of n this approach is prohibited by the exponen- 
tial growths of the number K(n) of branches that have to be 
explored. 

Alternatively we can explore the tree of A-tuples by walk- 
ing it randomly. Given a tuple (Ai . . . , A„), we generate a ran- 
dom integer 1 < k < n — 1 with probability 



p{k<e) 



(£<n-l) 
{t = n-\) 
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VE[X 2 ] 

Figure 5. Probability density function of A2/E [A2] . 



and using this random k we generate a new tuple of size n — 1 
according to Eqs. ( fTTT i or ( fT3] l. This process is iterated until the 
tuple size is two, and the final value of A2 is the parameter for 
the statistics of L. The probability density of A2/E [A2] is shown 
in Fig- [5j. Again the data collapse corroborates the claim that 
the statistics of LDM is dominated by a single scale. 



4 Rate Equation 

We can turn the exact recursions from Sec.[3]into a set of rate 
equations for the time-evolution of the average A -tuple. Let Xj 
denote the value of A; after t iterations, such that 



(Ai,A|, . . . ,A^_ ( ) — > (X[ + ,X{ + 



3 r+i > 



(16) 



As explained in Sec. [3] at "time" t a number k, 1 <k<n— 1— t 
is chosen with probability 



P, (k < I) 



1 



wk< c^"- 1 -*). (17) 



(i = n - 1 - /) 



Depending on the choice of k, Eqs. < flTT > and $13[ suggest 
that A/ +1 only takes on one of two possible values. For 1 < i < 
n — t — 1, these are 



X\ + X'„_ t (i<k<n-t-l) 



(1 <k<i) 



(18) 



whereas for i = n—t — 1 , the two values are 



3 r+i 

A n-t-l 



(* = rt-f-l) 

(1 <Jk<n-t-l) 



We introduce the shorthand 



i-l Aj 
7=1 7 /t n-( 



(19) 



(20) 



for the probability of k > i at iteration t. On average, the evolu- 
tion of X\ is given by the rate equation 

x> +l =xu(i-&i) + W+K- t )^'i, (2D 

for all 1 < i < n — 1 — t , and at the upper boundary 

K-rK-X-f (22) 



These equations are defined on the triangular domain < t < 
n— 1,1 < i < n — t. The initial conditions are 



,f=0 . 



1 (l<i<n). 



(23) 



As described in Sec. [3] the process terminates at t = n — 2 with 



A£ 2 characterizing the exponential variate for the final differ- 



1 n — 2 



ence in LDM. Yet, Eq. (|22) for t = n - 2 implies A"~ 1 = Af 
reflecting the final, trivial differencing step, and it will prove 
conceptually advantageous to focus on the asymptotic proper- 
ties of Aj'~ 1 instead. 

Since the rate equation is an approximation to the exact re- 
cursion, we need to check how accurate it is. We have solved 
the rate equations d21H231 l numerically up to n — 5 • 10 6 . Fig. [8] 
shows 

\n(X?- l (n+l)) 



In n 



from the rate equation versus l/ln«. If A" were calculated 
as an average from the exact recursion, it should be equal to 

lnE[L„] 
ln 2 « 

from the direct simulation of LDM. Fig. [S] shows this quantity, 
too. Apparently the error introduced by approximating the ex- 
act recursion by the rate equation vanishes for n — > 00, and our 
conjecture is that the rate equation and the exact recursion are 
asymptotically equivalent. Judging from our numerical studies 
below, see Tab. 12 both asymptotic series have a relative differ- 
ence of size lnlnn/ln («). 

The time to solve the rate equation numerically scales like 
& (« 2 ) , so it is actually more efficient to simulate LDM di- 
rectly, not least because the sampling for the latter can be done 
efficiently on a parallel machine. For analytic approaches, how- 
ever, the rate equation is more convenient. 
The initial probabilities decay exponentially, 



9% = 2" 



(24) 



which implies that only the first values Ai, X2, ■ ■ ■ increase. Ev- 
erywhere else, SPi is essentially zero, and those entries will not 
increase until the first term of ( |2TI ) has copied the values from 
the low index boundary. Hence we expect a "wavefront" of in- 
creased A -values to travel with a velocity one index per time 
step toward the upper boundary, which in turn travels with the 
same velocity towards the lower boundary. As can be seen from 
Fig. [6] this traveling wavefronts of increasing heights are a hall- 
mark of the rate equation for all times t. We will use this intu- 
itive picture for an Ansatz to analyze both the exact recursion 
and the rate equation in the next two sections. 
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Figure 6. Contour plot on a logarithmic scale for the numerical solu- 
tion A? of the rate equations ( 12 l!23t at n = 256. The solution is A? ~ 1 
throughout the entire lower triangle, and it increases monotonically for 
increasing t above that. The solution rises by about a decade between 
each repeat of a band color. Note the ever more-rapid alternation be- 
tween narrowing and widening bands, signifying regions of rapid gain 
interrupted by extended plateaus. The regular banded structure along 
diagonals f — j = const justifies the similarity solution in Eq. ( 138b . The 
only notable exceptions occur in asymptotically diminishing regions 
near i = 1 and t = n/2, 3»/4,7n/8, 





5 Fibonacci Model 



Both the exact recursion and the rate equations yield 



A( +1 



K + K-t 



(25) 



for the lower boundary that we are ultimately interested in. This 
recursion connects the lower and the upper boundaries at ; = 1 
and at i = n — t. Unfortunately, X' _j depends in a complicated 
way on entries of the A -tuple at different times and different 
places. However, Fig. [6] suggests a similarity Ansatz 



which makes the upper boundary readily available: 



i r+i _ 1 1 

Aj — A^ - 

X[ = 1 



(0 < / < 71 — 1) 

(*<o) 



(26) 



(27) 



1 to hold 



Note that we have extended the initial conditions A/ 
for all negative times, too. 

It turns out that one can express the final value A" -1 of 
this recursion in terms of the corresponding values in smaller 
systems, which leads to a simple recursion in n. To derive this 
recursion it is convenient to visualize d27b in terms of paths 
in a right-angled triangle A„ (Fig. [7]). The hypotenuse of A„ 
represents t and ranges from — n+ 1 to n — 1 , the height is « — 1 . 
Let us discuss the basic mechanism for the example n = 8. The 
final recursion reads 

Xj = Xf + A^ , 

and the two terms on the right hand side correspond to two 
paths: one that connects 6 with 7 along the hypotenuse, the 
other connects 5 with 7 along the path that is "reflected" at the 



Figure 7. Proof of the Fibonacci recursion: The number of different 
paths from the leftmost point to the rightmost point in the triangle for 
n is the sum of the number of paths in the corresponding triangle of 
size [n/2] (top) plus the number of paths in the triangle of size n — 1 
(bottom). 

right leg of Ag. In our case a reflected path moves diagonally 
upward until it touches the right leg above point ;. From there 
it moves downward to point i + This peculiar "law of refrac- 
tion" implies that only every second point of the left half of the 
hypotenuse is connected to the right half by a reflected path. 
We can apply the recursion again and write 



XI = 



-A 4 - 



A, 1 



Here we have connected 6 with 5 along the hypotenuse and 
with 3 along a reflected path, and similarly for 5. We iterate 
this path finding process until all paths end on the left half of 
the hypotenuse (negative f)- Here the paths collect the initial 
values X\ = 1, hence X\ equals the number of different paths 
that connect the points — 7 , —5 1 to the point 7 on the hy- 
potenuse. Instead of considering each paths that starts on the 
left half of the hypotenuse separately we let all paths start in 
the leftmost point —7. The rule for path finding then is: if you 
are on an even index, move one unit to the right. If you are 
on an odd index, there are two branches: one to the right, the 
other 45 degrees upward and reflected down to the hypotenuse. 
Obviously, Xj equals the number of different paths that con- 
nects the leftmost point of Ag to the rightmost point according 
to this rules. Let T„(i) denote the number of paths that connect 
the point i with n — 1 in A n . Then we have 



T n (- 



l 



xr 
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Now, starting at — n + 1, we have two choices: move upward 
for a reflection that will take us to point 1 or move along the 
hypotenuse to point —n + 3: 

r„(-n+i) = r„(i)+r JI (-n+3). 

As we can see in Fig. [7] (top), the number of paths from 1 to 
n — 1 is exactly the same as the total number of paths in A n / 2 - 
Hence 

T„(l) = T n/2 (-n/2 + l). 

Similarly, the number of paths from — n + 3 to n — 1 equals the 
total number of paths in a slightly smaller triangle, as can be 
seen in Fig. [7] (bottom). Hence we have 

r„(-n + 3) = T„_ 1 (-n + 2), 

and all three equations yield 

r„(-n+l) = T„ /2 (-n/2 + l)+r„_i(-n + 2). 

The derivation of a corresponding equation for odd values of n 
is straightforward. If we define 

F{n):=T n {-n+\) = X n l -\ (28) 

the recursion for T„ translates into the Fibonacci like recursion 



F(n)=F(n-l)+F([n/2] 
F(l) = l 



(29) 



where [x] refers to the integer part of x. The resulting sequence 
is known as A033485 in [24|. The generating function g(z) = 
L„F(n)z" satisfies the functional equation 



Kz)(l' 



= z+(l- 



;{z 2 ) , 



and is given by 



rfr) = - 2 



(i 



rwi-z 2 *) 



(30) 



(31) 



F(n) can be evaluated numerically for values of n that are larger 
than the values feasible for simulations of LDM or for solving 
the rate equation. The bottleneck for calculating F(n) is mem- 
ory, not CPU time, since n/2 values must be stored to get F(n). 
With 3 GByte of memory, we managed to calculate F(n) for 
n < 6 • 10 8 . We will derive the asymptotic s of F(n) in the next 
section. 

Fig |8] shows F(n) within the same scaling as the simula- 
tions of LDM and the numerical solution of the rate equation. 
Apparently the similarity Ansatz does not capture the full com- 
plexity of the LDM algorithm or the rate equation. Yet it yields 
a very similar qualitative behavior. And in the next section we 
will show that 

lnF(n) _ 1 



lim 



ln 2 n 



21n2 



(32) 




0.05 0.1 
l/ln(n) 

Figure 8. Four models of LDM: Direct simulation (Z = l/E[nL„]), 
rate equation (Z = A" -1 ), the Fibonacci model Z = F(n) from i29\ 
and the similarity solution Z = f(n) of the continuous rate equation, 
given by ( 149b . The dashed line represents ( 150b . All dotted lines are 
numerical fits of the type d51t . 



6 Continuum Limit 



To analyze the rate equations ( 1211123b . it is convenient to con- 
sider the continuum limit for n — » °°. Asymptotically, a con- 
tinuum solution may differ from the discrete problem in cor- 
rections of order 1 jn. As we will see, such corrections are in- 
accessible, as the asymptotic expansion is a series in terms of 
l/ln(n). 

We rewrite Eq. ( f2TT > in terms of discrete differences, 

-x\ = - (a/ - xu) + (a/ - xu+K- t ) &\ ■ (33) 

Setting 



t = sn (0<5<1), 
i = xn (0 < x < 1 — s) , 

H =y(x,s), 



(34) 



we obtain for large « 

y(x,s) = II(x,s) 
where we have set 



d_ d_ 

dx ds 



—y{x,s)+y(l-s 7 s) p5) 
nax 



n(x,s) =exp< l n j d^\na(^,s) 



with 



a(x,s) 



y(x,s) 



< 1. 



(36) 



(37) 



y(x,s)+y(l-s,s) 

The left-hand side of Eq. d35l l. as well as the numerical solu- 
tion of the full rate equations ( |2TI|231 ) displayed in Fig.|6j again 
suggest a similarity Ansatz 



y(x,s) = Y(s-x). 



(38) 
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This Ansatz yields immediately for Eq. ( |35| >: 



o = n(x,s) 



--y'(s-x) + y(2s- 1) 
n 



(39) 



For almost all x > 0, the right-hand side vanishes by virtue of 
II(x,s) — > 0, as indicated by Eq. (l36l l for a < 1 and n — * °o. 
Correspondingly, 17 (x = 1 —s,s) = at the upper boundary, 
which justifies the similarity solution for the continuum limit 
of Eq. ([22}. Yet, 17 (x = 0,s) = 1 for all s, hence we are left 
with 

-r'(s) = yO-i), (40) 

which can be interpreted as the continuous version of ( f2Tb . 
From the initial conditions of the discrete problem in (l23l it 
is clear that y(:c,0) = 1. For the similarity solution, this implies 
that 

y0) = i, (-1 <*<()). (41) 

Integrating d40l i. we formally obtain 



y(s) = r(0)+n / d%y{2%-\). 
Jo 

Thus, we can evaluate the integral for < s < \ to get 

r 



= 1 +ns, 



0<s< 



(42) 



(43) 



We can continue this process for i < s < |, i. e.,0<2i— 1 < j, 
exactly the domain of validity of d43l l. to obtain 



y(j) = y(0) +n I" d§ y(2l; - 1) +n | ^ 7 (2^ - 1), 



\+ns + ^{2s-\) 2 



1 3 

- < s < - 

2 - - 4 



The emergent pattern is best represented by defining 

ft CO = 7(0 . 



l-2 1- *<s< 1-2~ A ' 



(44) 



(45) 



for k = 0, 1,2, . . ., where Eqs. d4lH44t represent = 0, 1 and 2. 
In general, we find that 



y k+ x{s) = y k [\-2- k 

which is solved by 

7k{s) = 



1— 2 _ * 



^7*(2§-l), (46) 



E 



(2> 



i=o;!2(0 
For any n, we are interested in y(s 



1) J 



l)~lim f _ 



■n-\ 



y(\) = \\my k 

k^oa 



j=oj\ 2(2) 



(47) 



hence 



(48) 



which concludes our solution of (|40b - The sum for 7(1) still 
depends on «, hence we define 



/(") = I 



2© 



(49) 



z 


/ 


F 


Af- 1 




Cl 


-1.44 


-1.45 


-1.22 


-1.24 


C2 


-1.00 


-1.42 


-3.06 


-3.86 


C3 


0.72 


1.01 


1.23 


1.55 



Table 2. Parameters for i5ll used in Fig. [8] 



Now /(n) can be evaluated numerically for very large values 
of n. Fig. [8] shows the result for n < 2 2000 . Don't try this at 
home unless you have a computer algebra system. Interest- 
ingly, In /(«)/ln 2 n asymptotically approaches a value that is 
extremely close to 1 /21n2. In fact, an asymptotic analysis (see 
Appendix) reveals 



In [/(«)(« + !)] 



1 



1 



ln 2 « 



1 /lnln2 
2 In 2 Inn \ In 2 
1 /In2 + 41nln2 ln 2 ln2 



ln 2 « V 
lnlnn 1 



21n2 



lnlnn In Inn 1 



(50) 



In n In 2 



ln 2 n 



ln 2 « 21n2' 



which is the dashed line in Fig. [8] The dotted lines are numer- 
ical least square fits of the lnlnn terms of this scaling, i.e., fits 
of the form 



ln[Z(n)(n-\ 


-!)L, 1 


1 /lnln2- 


hi 3\ 


ln 2 n 


21n2 


lnn \ ln2 


+ i) 






^In2+41nln2 


In 2 In 2 \ 




+ ln 2 « 


V 8 


21n2 J 




lnlnn 


In Inn 


In 2 Inn 




Inn 


Cl + , , C9 + 

In n 


1 2 C3 
In n 



(51) 



with values for c; as shown shown in Table |2] Note that the se- 
ries ( |49l as a solution of ( |40b and the first terms of the asymp- 
totic expansion d50l l have been derived independently in the 
context of dynamical systems l25ll . 



7 Conclusion 



The numerical data supports the claim that the complete statis- 
tics of LDM is dominated by a single scale ~ n~ cln ", not just 
the expectation as described in (|2]). The available data is not 
sufficient to pin down the precise asymptotic scaling, however. 
In fact a naive extrapolation of the available data even con- 
tradicts the known asymptotic bound (|3j. With its ff(n\nn) 
complexity, LDM is a very efficient algorithm, but probing the 
asymptotics requires Inn to be large. This discrepancy of scales 
eliminates simulations as a means to study the asymptotics of 
LDM and calls for alternative approaches. 

We have taken a step in the direction of a rigorous asymp- 
totic analysis by mapping the differencing algorithm onto a rate 
equation. The structure seen in the evolution of this rate equa- 
tion (Fig. |6]l suggests a similarity Ansatz d26b . With the help 
of this Ansatz we could reduce the exact recursion in A -space 
to the Fibonacci model (|29l . The asymptotics of this model 
can be calculated, and it agrees with (|2|i and (01. The same 
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Ansatz plugged into the rate equation even allows us to calcu- 
late the first terms of an asymptotic expansion (f50b . Although 
our Ansatz does not yield a proof, the extracted asymptotic 
behavior satisfies all previous constraints and provides a con- 
sistent interpretation of the numerical results. Hence, our rate 
equations pave the way for further systematic investigations. 

We appreciate stimulating discussions George E. Hentschel and Cris 
Moore. S.M. enjoyed the hospitality of the Cherry L. Emerson Cen- 
ter for Scientific Computation at Emory University, where part of this 
work was done. Most simulations were run on the Linux-Cluster TINA 
at Magdeburg University. S.M. was sponsored by the European Com- 
munity's FP6 Information Society Technologies programme, contract 
IST-001935, EVERGROW. 



A Asymptotic Analysis 

To evaluate the series (|49| | we apply Laplace's saddle-point 
method for sums as described on p. 304 of Ref. 11261 , For 



;!2 



the saddle point is determined by D<j>j = <pj — = 0, i. e., 
= Dln(aj) = In or 

n 

(52) 



Hence, we obtain a moving (n-dependent) saddle point at 



./o 



In 7i 
Tn2 



In 2 



KM) 
ln21n« 



1 

Inn 



(53) 



including terms to the order needed to determine /(«) up to 
the correct prefactor. We keep the l/ln(«)-corrections, since 
<pj contains terms like joln(n). In particular, it is 



y't/'-i) 

<t>j =j\nn-\nj\- J -^— -L]n2. 



(54) 



As the saddle point jo is large for large n, we can replace j\ 
by its Stirling-series l26l . Then, we expand around the saddle 
point by substituting j = jo + 7], keeping only terms to 2nd 
order in 77 and those that are non- vanishing for n — ► °°. We find 



In n 1 . . In2 , „, , 
-ln(2w)-— 77(7? + l) + ff(n), 



21n2 

with log-polynomial corrections 



ln« 





ln2" 











1 



■\n z 



Inn 



(55) 



21n2~" \ln(2) J "' \ln(2) J 
We finally obtain for the asymptotic expansion of ( l50l >: 

dr] exp(f 0+I? ) 

f ln 2 n 




exp ■ 



21n2 



(56) 
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